Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zeroize: zeroize entire capacity of Vec #341

Merged
merged 3 commits into from Jan 30, 2020
Merged

zeroize: zeroize entire capacity of Vec #341

merged 3 commits into from Jan 30, 2020

Conversation

aticu
Copy link
Contributor

@aticu aticu commented Jan 28, 2020

Re-implement #180, but without any additional required trait bounds, which is why #180 was reverted in #276.

// - The total size of the slice must be no larger than `isize::MAX`.
// This is true, because `Vec` never allocates more than `isize::MAX` bytes.
let extra_capacity = unsafe {
core::slice::from_raw_parts_mut(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to create a slice of uninitialized memory, to give it to the Zeroize impl on [Z], which iterates over it and performs ptr::write_volatile.

It might make sense instead to perform ptr::write_volatile here directly, followed by atomic_fence(), to avoid ever constructing an uninitialized slice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively this was recently stabilized: rust-lang/rust#68234

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite see how constructing a *mut [T] helps. It's main intent is for data structures storing raw pointer or a NonNull<T> without lifetime information but here it would still require somehow manually iterating over the individual element pointers to do a volatile write. In particular, the implementation of Z::zeroize(&mut self) can still not be called without UB.

Copy link
Member

@tony-iqlusion tony-iqlusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks ok now, but will run it by a few others before merging

@aticu
Copy link
Contributor Author

aticu commented Jan 29, 2020

Thanks! By the way, I just noticed that a similar thing might make sense for String. If you want, I could also implement that.

Oh and I just noticed the CONTRIBUTING.md file. Should I add another commit adding myself to the AUTHORS.md?

@tony-iqlusion
Copy link
Member

Should I add another commit adding myself to the AUTHORS.md?

Yes, that'd be good.

I just noticed that a similar thing might make sense for String. If you want, I could also implement that.

Sure, but I'd prefer a separate PR so it doesn't block this one.

let extra_capacity_start = unsafe { self.as_mut_ptr().add(self.len()) as *mut u8 };
let extra_capacity_len = self.capacity().saturating_sub(self.len());

for i in 0..(extra_capacity_len * core::mem::size_of::<Z>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a simpler option than casting to a *mut u8, which is using mem::zeroed to produce the all-zero byte pattern for Z, and ptr::write_volatile to write it, e.g.

for i in 0..extra_capacity_len {
    unsafe { ptr::write_volatile(extra_capacity_start.add(i), mem::zeroed()); }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but it would result in undefined behavior, when Z is not a type where an all-zero pattern is a valid value. However now that I think about it again, it wouldn't make much sense to implement Zeroize for such a type. I'll adjust that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually on second thought, if it were implemented that way, it would be possible to invoke undefined behavior from safe code. For example this would then result in undefined behavior:

use core::num::NonZeroU8;
use zeroize::Zeroize;

struct NonZero(NonZeroU8);

impl Zeroize for NonZero {
    fn zeroize(&mut self) {
        self.0 = NonZeroU8::new(1).unwrap();
    }
}

fn main() {
    let mut vec = vec![NonZero(NonZeroU8::new(2).unwrap())];
    vec.clear();
    // undefined behavior: this would create a `NonZeroU8` with a
    // memory-representation of all-zeroes while zeroing the
    // uninitialized memory
    vec.zeroize();
}

This may not be a very useful implementation of Zeroize, but it would be a way to invoke undefined behavior in safe code. The other version does not have this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the uninitialized memory be exposed to safe code?

Regardless, the net effect is the same: you are writing zeroed bytes to the excess capacity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the uninitialized memory be exposed to safe code?

It doesn't need to. According to the doc of mem::zeroed:

There is no guarantee that an all-zero byte-pattern represents a valid value of some type T. For example, the all-zero byte-pattern is not a valid value for reference types (&T and &mut T). Using zeroed on such types causes immediate undefined behavior because the Rust compiler assumes that there always is a valid value in a variable it considers initialized.

Which links to a section in the docs about undefined behavior:

It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.

The same page also states that

Rust code is incorrect if it exhibits any of the behaviors in the following list.

  • [...]
  • Invalid values for a type with a custom definition of invalid values. In the standard library, this affects NonNull<T> and NonZero*.

The example above would therefore be both unsound and incorrect, which is possible from safe code. The values do not need to be accessible by safe code to create the problem here.

With the method which zeroes the memory using u8 this would not be the case, as u8 is guaranteed to be able to hold a value of 0.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a better solution?

The implementation, as currently written in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems like 6 of one, half dozen of another to me, except looping a byte-at-a-time and doing pointer arithmetic around core::mem::zeroed::<NonNull<T>>() is more complex and potentially slower.

It seems the crux of this is...

Memory behind a raw pointer can hold any value without any problem, because Rust does not consider it to be initialized. The problem only arises when an invalid value of type T exists, which Rust considers initialized, which mem::zeroed() does.

...but we're talking about a buffer which is defined as being:

What does a normal Vec::with_capacity call initialize it to?

Nothing, it does not create a value of type T though and just leaves the memory completely uninitialized.

The contract of Vec is always to initialize this capacity in some way before reading from it.

It seems the claim is it might be considered initialized by the Rust compiler when it is uninitialized, but before its API permits any reads, it will be initialized again.

Can either of you give a concrete example of a tractable problem which is possible given the contract of Vec to always (re)initialize this memory which would not occur with pointer-based zeroing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are doing to the excess buffer of the vector is and never was the problem. You are indeed free to do with the extra capacity of bytes whatever you want. But not however you want. The code that writes invalid or uninitialized values into it must still be UB-free and executing mem::zeroed::<T> is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can either of you give a concrete example of a tractable problem which is possible given the contract of Vec to always (re)initialize this memory which would not occur with pointer-based zeroing?

The problem is not the Vec or its memory, the problem is simply that the code could call mem::zeroed<NonNull<()>>, which is UB. The compiler might make optimizations that would crash the program if such a value ever exists. This would be the same problem in every program, even without a Vec being involved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I get it now. Thank you for the explanation.

aticu and others added 2 commits January 29, 2020 18:54
I, Niclas Schwarzlose, hereby agree to license all contributions I make
to this project under the terms of the Apache License, Version 2.0.
@tony-iqlusion tony-iqlusion merged commit 9a2610e into iqlusioninc:develop Jan 30, 2020
@tony-iqlusion tony-iqlusion mentioned this pull request Sep 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants